Cached DRAM adds a small cache onto a DRAM chip to reduce average DRAM access latency. The authors compare cached DRAM with other advanced DRAM techniques for reducing memory…
Authors
Zhao Zhang, Zhichun Zhu, and Xiaodong Zhang
Abstract
As the speed gap between processor and memory widens, data-intensive applications such as commercial workloads increase demands on main memory systems. Consequently, memory stall time (both latency time and bandwidth time) can increase dramatically, significantly impeding the performance of these applications. DRAM latency, or the minimum time for a DRAM to physically read or write data, mainly determines latency time. The data transfer rate through the memory bus determines bandwidth time. Burger, Goodman, and Kägi show that memory bandwidth is a major performance bottleneck in memory systems. More recently, Cuppu et al. indicate that with improvements in bus technology, the most advanced memory systems, such as synchronous DRAM (SDRAM), enhanced SDRAM, and Rambus DRAM, have significantly reduced bandwidth time. However, DRAM speed has improved little, and it remains a major factor in determining memory stall time, which significantly affects the performance of data-intensive applications such as commercial workloads.

In a cached DRAM, a small on-memory cache is added onto the DRAM core. The on-memory cache exploits the locality that appears on the main memory side. The DRAM core can transfer a large block of data to the on-memory cache in one DRAM cycle; this block can be several dozen times larger than an L2 cache line. The on-memory cache thus takes advantage of the DRAM chip’s high internal bandwidth, which can be as high as a few hundred gigabytes per second.

Hsu and Smith classify cached DRAM organizations into two groups: those where the on-memory cache contains only a single large line buffering an entire row of the memory array, and those where the on-memory cache contains multiple regular data cache lines organized as direct-mapped or set-associative structures. In a third class combining these two forms, the on-memory cache contains multiple large cache lines buffering multiple rows of the memory array, organized as direct-mapped or set-associative structures. Our work and other related studies belong to this third class.

Cached DRAM improves memory access efficiency for technical workloads on a relatively simple processor model with small data caches (and in some cases, even without data caches). In a modern computer system, the CPU is a complex instruction-level parallelism (ILP) processor…
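To make the third-class organization concrete, here is a minimal simulation sketch in Python. All names, sizes, and cycle counts below are illustrative assumptions (a 2 KB row, a 4-set, 2-way LRU on-memory cache, 1-cycle hits, 10-cycle misses), not figures from the article:

# Sketch of a "third class" cached DRAM: the on-memory cache holds
# multiple large lines, each buffering a full DRAM row, organized
# set-associatively with LRU replacement. All parameters are assumed.
ROW_SIZE = 2048        # bytes buffered per on-memory cache line (one row)
NUM_SETS = 4
WAYS = 2
HIT_LATENCY = 1        # cycles: fast SRAM access on the DRAM chip
MISS_LATENCY = 10      # cycles: full DRAM core access to refill a line

class CachedDRAM:
    def __init__(self):
        # each set holds up to WAYS row numbers, most recently used last
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return the simulated latency of one memory access."""
        row = addr // ROW_SIZE          # which DRAM row the address is in
        s = self.sets[row % NUM_SETS]   # index the set by row number
        if row in s:                    # on-memory cache hit
            s.remove(row)
            s.append(row)               # refresh LRU position
            return HIT_LATENCY
        # Miss: the DRAM core moves the whole row (a block dozens of
        # times wider than an L2 line) into the on-memory cache at once,
        # exploiting the chip's high internal bandwidth.
        if len(s) == WAYS:
            s.pop(0)                    # evict the least recently used row
        s.append(row)
        return MISS_LATENCY

mem = CachedDRAM()
print(mem.access(0x0000))   # first touch of row 0: miss, 10 cycles
print(mem.access(0x0040))   # same row: on-memory cache hit, 1 cycle

Accesses that fall in a recently used row are served at on-chip SRAM speed, which is how the on-memory cache hides DRAM core latency for workloads with main-memory-side locality.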
Similar Papers
A Single Chip Multiprocessor Integrated with High Density DRAM
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing memory latency and improving memory bandwidth. In this paper we evaluate the performance of a single chip multiprocessor integrated with DRAM when the DRAM is organized as on-chip main memory and as on-chip cache. We compare the performance of this architecture with that of a more c...
DRAM Caching
This paper presents methods to reduce memory latency in the main memory subsystem below the board-level cache. We consider conventional page-mode DRAMs and cached DRAMs. Evaluation is performed via trace-driven simulation of a suite of nine benchmarks. In the case of page-mode DRAMs we show that it can be detrimental to use page-mode naively. We propose two enhancements that reduce overall memo...
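As a hedged illustration of this paper's point that naive page mode can backfire, the following Python sketch compares an open-page policy (leave the last row open) against a closed-page policy on a stream with no row locality. All timing parameters and the 2 KB page size are assumptions, not the paper's numbers:

# Why naive page mode can be detrimental: an open row is fast to re-hit,
# but a row-buffer conflict must first precharge the stale row, costing
# more than a closed-page access would have. Cycle counts are assumed.
ROW_BITS = 11    # assume 2 KB pages

T_CAS = 3        # column access when the row is already open
T_RCD = 3        # row activate
T_RP  = 3        # precharge

def open_page_latency(accesses):
    """Open-page policy: leave the last row open after each access."""
    total, open_row = 0, None
    for addr in accesses:
        row = addr >> ROW_BITS
        if row == open_row:
            total += T_CAS                 # row-buffer hit
        elif open_row is None:
            total += T_RCD + T_CAS         # bank idle: activate + read
        else:
            total += T_RP + T_RCD + T_CAS  # conflict: precharge first
        open_row = row
    return total

def closed_page_latency(accesses):
    """Closed-page policy: precharge immediately after every access."""
    return len(accesses) * (T_RCD + T_CAS)

# A stream with no row locality makes the naive open-page choice worse:
stream = [i << ROW_BITS for i in range(100)]   # every access a new row
print(open_page_latency(stream))    # 6 + 99 * 9 = 897 cycles
print(closed_page_latency(stream))  # 100 * 6 = 600 cycles

With good row locality the open-page policy wins instead; the point is that applying page mode blindly, without regard to the access pattern, can raise total memory latency.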
Reducing On-Chip DRAM Energy via Data Transfer Size Optimization
This paper proposes a software-controllable variable line-size (SC-VLS) cache architecture for low power embedded systems. High bandwidth between logic and a DRAM is realized by means of advanced integrated technology. System-in-Silicon is one of the architectural frameworks to realize the high bandwidth. An ASIC and a specific SRAM are mounted onto a silicon interposer. Each chip is connected t...
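The core mechanism can be sketched as follows. This is a hypothetical rendering of the variable line-size idea only; the class, method names, sub-line granularity, and energy figures are invented for illustration and are not from the paper:

# Sketch of a software-controllable variable line-size (SC-VLS) cache:
# software sets the fill size so a miss transfers only as many sub-lines
# as the current program phase can use, saving data-transfer energy.
SUBLINE_BYTES = 32          # assumed sub-line granularity
ENERGY_PER_SUBLINE = 1.0    # arbitrary energy units per sub-line moved

class SCVLSCache:
    def __init__(self):
        self.fill_sublines = 4       # software-visible line-size setting
        self.transfer_energy = 0.0   # accumulated data-transfer energy

    def set_line_size(self, sublines):
        """Software control point: pick a small fill for phases with
        little spatial locality, a large one for streaming phases."""
        self.fill_sublines = sublines

    def handle_miss(self):
        # Fetch only the configured number of sub-lines from on-chip DRAM.
        self.transfer_energy += self.fill_sublines * ENERGY_PER_SUBLINE

cache = SCVLSCache()
cache.handle_miss()           # default fill: 4 sub-lines of energy
cache.set_line_size(1)        # low-locality phase: shrink the line size
cache.handle_miss()           # now only 1 sub-line is transferred
print(cache.transfer_energy)  # 5.0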
Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions
Performance improvements from DRAM technology scaling have been lagging behind the improvements from logic technology scaling for many years. As application demand for main memory continues to grow, DRAM-based main memory is increasingly becoming a larger system bottleneck in terms of both performance and energy consumption. A major reason for poor memory performance and energy efficiency is me...
DRAM Aware Last-Level-Cache Policies for Multi-core Systems
…latency DTC in two cycles. In contrast, state-of-the-art DRAM caches always read the tags from the DRAM cache itself, which incurs high tag lookup latencies of up to 41 cycles. In summary, high DRAM cache hit latencies, increased inter-core interference, increased inter-core cache eviction, and the large application footprint of complex applications necessitate efficient policies in order to satisfy the ...
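The tag-lookup tradeoff this excerpt describes can be sketched as follows. The structure, capacities, and latencies are illustrative assumptions; only the 2-cycle DTC and up-to-41-cycle in-DRAM figures come from the excerpt itself:

# Tags kept in the DRAM cache cost a slow in-DRAM read, while a small
# on-die DRAM Tag Cache (DTC) answers repeated lookups in a few cycles.
from collections import OrderedDict

DTC_LATENCY = 2        # cycles: tag found in the on-die tag cache
DRAM_TAG_LATENCY = 41  # cycles: tag must be read from the DRAM cache

class DramCacheTags:
    def __init__(self, dtc_entries=4):
        self.dram_tags = {}       # set index -> tag, stored in DRAM
        self.dtc = OrderedDict()  # small on-die copy, kept in LRU order
        self.dtc_entries = dtc_entries

    def lookup(self, set_index):
        """Return (tag, latency_in_cycles) for one tag lookup."""
        if set_index in self.dtc:
            self.dtc.move_to_end(set_index)  # refresh LRU position
            return self.dtc[set_index], DTC_LATENCY
        tag = self.dram_tags.get(set_index)  # slow path: read from DRAM
        if len(self.dtc) >= self.dtc_entries:
            self.dtc.popitem(last=False)     # evict least recently used
        self.dtc[set_index] = tag
        return tag, DRAM_TAG_LATENCY

tags = DramCacheTags()
print(tags.lookup(7))   # first lookup: tags read from DRAM, 41 cycles
print(tags.lookup(7))   # repeated lookup hits the DTC, 2 cycles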